A Player-like Agent Reinforcement Learning Method For Automatic Evaluation of Game Map


Game maps are important human-computer interactive content-bearing platforms in major games. With the application of cellular automata (CA) and Procedural Content Generation (PCG) in map generation, the spatial scale and data volume of current game maps are increasing greatly. In game map testing, however, automatic methods such as interactive test scripts are inadequate in both depth and breadth of application. In particular, they lack evaluation of game maps from the player-experience perspective. This research proposes an automatic game map test method based on agent reinforcement learning. We establish interactive action models of agents that stand for the behaviors of different types of players in the map, and use the agents' actions to enhance general evaluation of the map environment. This can help optimize game map design from the perspective of player experience by providing quantitative values of a map's shortcomings. Finally, scenes of our campus in Minecraft were used as experimental environments to verify the effectiveness of the method.
CCS CONCEPTS • Games and Play → Computational Interaction


Additional Keywords and Phrases: Games/Play; Machine Learning; Programming/Development Support; Artifact or System; Method; Theory; Application Instrumentation/Usage Logs; Quantitative Methods; Usability Study




[1] INTRODUCTION 


This paper introduces a modified reinforcement learning model that uses player-like agents to evaluate game maps from the perspective of player experience. Demand is rising for high-quality game maps that are ever larger than before in massive games. As a result, research on game map testing now deserves additional attention, especially automatic testing from the perspective of player experience.

1) In recent years, the scale and complexity of modern game maps have exploded with the assistance of PCG (Procedural Content Generation) methods. For example, Assassin's Creed, a AAA series, has been continually updated over the past 10 years. Over that period its game map has grown about 1700-fold, from 0.13 square kilometers in Damascus to 230 square kilometers in the North Sea area of Europe (Figure 1).
Figure 1: Evolution of game maps


Automatic procedural modeling of game maps (mainly PCG) has been an academic research frontier for over thirty years. It has produced high-quality procedures for specific game map features of all types [1], such as landscapes [2, 3], rivers [4-6], plant models [7] and vegetation distribution [8], road networks [9], urban environments [10], and building facades [11-13]. [1] introduced declarative modeling of virtual worlds, which combines various procedural modeling techniques with a semantics-driven model to capture the designer's intent. [14] enables amateur players to create custom game content by constructing 3D buildings from building footprints grown by an L-system. These PCG achievements have led to engineering tools such as Houdini, which can produce massive, large-scale game maps quickly.
Figure 1: An overview of the workflow of procedural game map generation and objective consistency maintenance [15-17]

Many studies have also addressed how to objectively ensure the consistency and compatibility of the various elements of a PCG map. [18] proposed a shader-based system for real-time integration of Geographic Information System (GIS) vector features, such as roads and rivers, into a DEM. [19] presented an interactive simulation system for cities growing over time, which works by expanding streets in the city's road network. A dynamic system that connects geometric modeling with behavioral modeling has also been proposed. [20] applied evolutionary and other metaheuristic search algorithms to automatically generate content for games, both digital and non-digital (such as board games). [15] discussed the consistency of all content generated by various procedural models, a concern that goes far beyond the internals of individual procedural methods. These objective verification methods (also called static tests) are often implemented as automated test scripts. They test generated game maps against computable criteria, for example: is there a path between the entrance and exit of the dungeon [24]? Do the trees have proportions within a certain range? Alternatively, they use a fully automated image-processing pipeline to compare and judge examples [21]. If the test fails, all or part of the candidate game map is discarded and regenerated, and this process continues until the content is good enough [20].
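A criterion such as entrance-exit connectivity is typically checked with a graph search. The sketch below is a minimal breadth-first search on a 2D grid. The grid encoding ('#' for walls, 4-neighbour movement), the toy `DUNGEON` layout and the function name `has_path` are illustrative assumptions, not the procedure of [24].

```python
from collections import deque
from typing import List, Tuple

Cell = Tuple[int, int]


def has_path(grid: List[str], start: Cell, goal: Cell) -> bool:
    """Return True if `goal` is reachable from `start` with 4-neighbour moves.

    Cells marked '#' are walls; every other character is walkable.
    """
    rows, cols = len(grid), len(grid[0])

    def walkable(r: int, c: int) -> bool:
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] != "#"

    if not (walkable(*start) and walkable(*goal)):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if walkable(*nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Toy dungeon: S = entrance at (0, 0), E = exit at (3, 7).
DUNGEON = [
    "S..#....",
    ".#.#.##.",
    ".#...#..",
    "####.#.E",
]

if __name__ == "__main__":
    print(has_path(DUNGEON, (0, 0), (3, 7)))  # True
```

A static test script built this way simply rejects the candidate map when the check returns False.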

2) However, subjective game map testing has failed to keep pace with automatic game map generation.

The subjective evaluation (also called dynamic test) of a game map may involve a human observer who specifies which individuals survive in each generation of the map [22, 23]. Two traditional kinds of manual subjective game map testing, public testing and internal testing, have hindered PCG applications in the game industry. Public testing covers the scope of game content (not only maps) very efficiently. However, it requires substantial manpower, plus costs such as advertising, to promote public participation. Internal testing by a game company's own staff cannot cover a large map. Moreover, internal testers cannot evaluate map design schemes on behalf of public players. This is also why MMO games such as World of Warcraft and JX Online 3 have held public betas in recent years.


3) So far, two possible solutions have not solved the problem of subjective game map testing.

One possible solution is to involve human players in PCG itself rather than in the testing process afterward. Among the PCG design metaphors, the PLAYER EXPERT [25] is supposed to encompass any analysis, interpretation, and adaptation suggestions specifically related to player experience, in any use of PCG that takes player behaviour and experience as input. Kazmi and Palmer [26] describe a system that embodies both a PLAYER EXPERT and a DESIGNER and is premised on analysing and interpreting player actions in terms of player skill and style. [27] proposed an interactive process between the player and the computer. At each step of the procedural models, the player observes the results and provides aesthetic feedback to guide the evolving equations, which achieves flexible complexity. This kind of solution slows down the entire map generation process and requires PLAYER EXPERTs with considerable knowledge of PCG. Moreover, the few PLAYER EXPERTs involved in PCG cannot represent all public players and cannot validate game maps by themselves. Expensive public testing therefore remains the most reliable game map testing method so far.

The other possible solution is to make objective automatic tests more subjective. That is, automated test scripts are given prior knowledge so that these artificial players [28-30] can evaluate a game map in a way closer to human players' experience, without losing efficiency or increasing cost. Game-playing agents benefit playtesting by reducing costs and the need for human playtesters [31]. Methods such as MCTS and reinforcement learning (RL) models enable automated playtesting without human player intervention. AI agents have been found useful for finding bugs [32] and tuning game parameters [33]. RL agents behave more like human players than traditional objective verification methods do, which increases the probability of finding bugs and exploits. Recent techniques have tackled these scenarios in two ways. Some use a single model that learns the dynamics of the whole game [34]. Others use two models that each focus on a specific domain (navigation and combat) [35]. Devlin et al. showed how observations of human play data can be used to bias MCTS to play the card game Spades [36]. They use a relative entropy measure to assess the similarity of playing styles to traces of human players. Zook et al. limited the computational resources of MCTS to simulate player skill for a number of games [37], and similar findings were reported by Nelson [38]. Khalifa et al. [39] describe another approach to biasing the MCTS search process toward human-like play. Christoffer Holmgård et al. [40] bias MCTS using evolution with designer-defined utilities, producing a set of personas that show what different playstyles might look like in MiniDungeons 2. [41] introduced a self-learning mechanism into FPS game testing. There, the number of game frames required to reach a certain percentage of the maximum reward (that is, for the agent to become well trained) serves as a quantitative indicator of the difficulty of the game environment. The shortcoming of previous studies is that these agents do not directly follow the behavior of real players. Instead, they are reinforced by reward guidance under different navigation targets. These navigation targets differ from the goals of players' interaction in the game map. In addition, the training environments or methods are simplified to varying degrees compared with real games.
Therefore, our work advances the state of automatic game map testing with model-based reinforcement learning and player-like agents. The workflow of this research, shown in the figure, consists of:

1) Game player behavior analysis and clustering for game map testing.
2) Constructing a player-like experience evaluation model of game maps.
3) Modifying model-based reinforcement learning for game map evaluation.
4) Experiments on Minecraft map testing.


[2] GAME PLAYER BEHAVIOR CLUSTERING FOR GAME MAP TESTING


In our research, the player behavior model focuses exclusively on automated game map testing. It is used to establish a map-related behavior MCT (Monte Carlo Tree) model that can drive an artificial agent as a policy function.

Human player behavior in any game can be regarded as sequential decision-making. A Markov decision process (MDP) is a formal framework for describing such a process, modeling the possible interactions between an arbitrary agent and its environment over time. The MDP method requires the human player behavior model to define the various states and direct actions precisely. Monte Carlo Tree Search (MCTS) is a common alternative method for solving MDPs. It estimates the optimal action by building a tree of possible future (game) states and rewards, with each tree node corresponding to the state resulting from an explored action [29]. The structure of the Monte Carlo tree can therefore differ greatly across human players and game types.
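For reference, the MDP mentioned in this paragraph is conventionally written as the tuple below. The notation is the standard textbook one, not taken from the original paper. In the map-testing setting, S would be the set of map states, A the roaming actions, and π the behavior policy of one player type, which the PSMCT is intended to encode.

```latex
\begin{aligned}
\mathcal{M} &= (\mathcal{S}, \mathcal{A}, P, R, \gamma) \\
P(s' \mid s, a) &= \Pr(s_{t+1} = s' \mid s_t = s,\ a_t = a) \\
\pi(a \mid s) &= \Pr(a_t = a \mid s_t = s) \\
G_t &= \sum_{i=0}^{\infty} \gamma^{i}\, r_{t+i+1}
\end{aligned}
```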
Map-related player behaviors have two characteristics [40]: 1) they are widespread, and 2) they are elementary interactions.

First, according to Bakkes [42], player behavior modeling can be divided into four levels: player analysis, the strategic level, the tactical level, and the action level. This behavior system, migrated from the field of military command, can be found in various games, as shown in the figure above. Across these four behavioral levels in various games, map interaction is indispensable.

Second, the elementary characteristics of map interaction behavior are also evident. As a typical discrete spatial area, a game map supports a limited number of player behaviors, including spatial dimension switching, speed switching, and changes in switching frequency [43]. Pure map interaction elements are simple and consistent across studies of game interaction [44]. They include discrete definitions of degrees of freedom and moving distance in map space, such as up, down, left, and right movement and moving step size [45, 46]. From these, spatial intersection, spatial aggregation, and similar relations are defined to generate interactive behavior with other game interaction elements (shooting targets, treasure boxes). The spatial position, movement speed, and current direction of the avatar are the map-related player behaviors [43]. They exclude social properties such as level, health, and strength. They also exclude scene attractions such as reward items, flags, and monsters, which vary across game types. Among these behaviors, movement through virtual worlds is one of the primary mechanics in open-world (sandbox) games [44]; in other words, MoveDistance is a player behavior highly related to the desire for game exploration [47]. [48] proposed that players usually use landmarks for pathfinding. Each type of player has a specific moving pattern (spatial decision tree) of transition probabilities between landmarks.
Thus, we propose the Pure Spatial Monte Carlo Tree (PSMCT) as the basic framework of the map test behavior model, as shown in the following figure. Its basic map interaction elements are close to the game character's own spatial roaming capabilities. They also fit better with a player interaction behavior model intended purely for automatic game map testing. PSMCT contains only the definitions of the basic interactive elements and basic states of the game map.
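The PSMCT's actions are fixed probabilities per map state rather than learned values, so an agent driven by it only needs to sample. The sketch below flattens one PSMCT into a state-to-action-distribution table, which simplifies the tree structure. The state keys follow map states listed in Table 1. The action names, the probabilities and the function names `validate` and `sample_action` are hypothetical placeholders, not the survey results.

```python
import math
import random
from typing import Dict

# One PSMCT flattened to a lookup table: map state -> {action: probability}.
Policy = Dict[str, Dict[str, float]]


def validate(tree: Policy) -> None:
    """Raise ValueError if any state's action probabilities do not sum to 1."""
    for state, dist in tree.items():
        total = sum(dist.values())
        if not math.isclose(total, 1.0):
            raise ValueError(f"probabilities for {state!r} sum to {total}")


def sample_action(tree: Policy, state: str, rng: random.Random) -> str:
    """Draw the agent's next roaming action for the current map state."""
    dist = tree[state]
    actions = list(dist)
    return rng.choices(actions, weights=[dist[a] for a in actions], k=1)[0]


# Hypothetical probabilities for one player type (not survey results).
EXAMPLE_TREE: Policy = {
    "front_obstacle": {"turn_left": 0.4, "turn_right": 0.4, "jump": 0.2},
    "three_way_road": {"turn_left": 0.3, "turn_right": 0.3, "take_middle": 0.4},
    "wide_pavement": {"forward": 0.8, "turn_left": 0.1, "turn_right": 0.1},
}

if __name__ == "__main__":
    validate(EXAMPLE_TREE)
    rng = random.Random(42)  # seeded so a test run is reproducible
    print([sample_action(EXAMPLE_TREE, "three_way_road", rng) for _ in range(5)])
```

One such table per player cluster gives one player-like agent type. No training step is involved.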


[3] PLAYER-LIKE EXPERIENCE EVALUATION MODEL OF GAME MAP 


From the perspective of human-computer interaction, players' game experience is a highly personalized and comprehensive concept. It contains rich elements, from the arousal of endogenous emotion [49] to engagement expressed externally through play duration and play frequency [29]. Most previous research relies on the assumption that player emotions can be inferred by associating player self-reports (subjective player experience modeling, SPEM) with game context variables (objective player experience modeling, OPEM) [50, 51]. However, SPEM usually contains significant experimental noise, which may be caused by player learning and self-deception effects. Moreover, self-reports in SPEM can be intrusive if questionnaire items are injected during gameplay sessions. Post-session questionnaires are only minimally intrusive, but they suffer from post-experience effects [52-54].

The objective PEM approach can be model-based or model-free. Model-based PEM refers to emotional models derived from emotion theories, for example:
- cognitive appraisal theory [55];
- usability theory [56];
- the belief-desire-intention model;
- the cognitive theory of Ortony, Clore, and Collins [57];
- Skinner's model [58];
- Scherer's theory [59].

There are also theories of player affect that are specific to games, such as Malone's design components for fun games [60], Koster's theory of fun [61], and game-specific interpretations of Csikszentmihalyi's concept of Flow [62], as well as the popular emotional dimensions of arousal and valence [63, 64]. Model-free PEM refers to the construction of an unknown mapping (model) between modalities of player input and an emotional state representation via player-annotated data [65]. The key limitations of the OPEM approach are its high intrusiveness, its low practicality (it is game-specific and highly complex), and its questionable feasibility.

Our study uses only game map exploration within a limited time span as the objective indicator of game map experience, because game map exploration is the main basis of higher-level game experience. Although the game experience metrics of OPEM and SPEM differ considerably, they agree on the levels of experience. Spatio-temporal features of game interaction (in our study, PSMCT) are usually mapped to levels of cognitive states such as attention, challenge, and engagement [51]. The player's cognitive processing patterns and cognitive focus may in turn influence emotions (affective states: fun, challenge, frustration, predictability, anxiety, and boredom [66]). Ferro et al. [67] proposed the Game Experience and Elements (GEM) framework. Using exploratory factor analysis (EFA), they also found that game map exploration is the basis of game experience and the most important cognitive element.

To simplify computation and to work with game map test agents that traverse the PSMCT, this study proposes an exploration-based game map experience function (EBGMEF). The calculation of a game map's degree of exploration rests on three assumptions:

A. The game map is spatially uniformly discretized, for example into a uniform hexagonal grid (as in Civilization 6 and Total War) or into uniform quadrilaterals or cubes (as in the Fire Emblem series and Minecraft). This assumption allows the overall experience value of the map to be decomposed into the sum of the experience values of the uniform discrete units.

B. Game exploration is time-relative and is influenced by the player's total game time. In previous research on player involvement or fatigue, important indicators such as operation frequency are also counted within a specified time [68, 69]. In game map testing, we count the size of the explored map area within a limited play time (which may be defined as the average length of one human play session). This measure is clear and consistent with how human players feel.

C. Game exploration is related to players' memory. The experience of a game map varies with players' spatial memory ability [70, 71]. This spatial memory is what remains of the player's map exploration, especially the instant impressions left after exploring maps in fog-of-war games (such as Star Wars and Age of Empires). Taking the spatial memory ability of different players into account therefore illustrates the experience value of game maps for different players better than ignoring it.


In the EBGMEF formula, a game map consists of n discrete units, each with an initial experience value of 1. When the player-like agent roams into a map unit for the first time, the value of that unit is counted once and only once (this is what constitutes exploration). k is the maximum number of map units that a certain type of player can remember, and j denotes the jth map unit on this memorized path. For example, if a certain player-like agent can remember 10 map units, k is 10 and j ∈ {0, ..., 9}. k is determined by the map memory rate Y and a memory threshold, which does not appear in the equation itself. The memory threshold eliminates map units that leave little impression. For example, if the memory threshold is set to 0.01 and Y is 0.8, then beyond 20 map units the remaining memory of the 21st unit is 0.8^21 ≈ 0.0092, which is less than 0.01, so the maximum number of remembered map units k is 20. The farther a map unit is from the current unit (j = 0), the less spatial memory value remains (the short-term memory characteristic of human beings [71]).
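The relation between the memory rate Y, the memory threshold and k in the worked example can be checked numerically. The sketch below assumes, in line with that example, that the remaining memory of the unit m steps back is Y^m. Writing θ for the threshold, k is then the largest m with Y^m ≥ θ, i.e. k = ⌊ln θ / ln Y⌋. The symbol θ and the function name `memory_capacity` are introduced here for illustration only. A loop is used instead of the closed form to avoid floating-point rounding when ln θ / ln Y lands near an integer.

```python
def memory_capacity(memory_rate: float, threshold: float) -> int:
    """Largest m such that memory_rate ** m >= threshold.

    Assumption: the memory of a unit m steps back decays geometrically as
    memory_rate ** m, and units whose memory drops below `threshold` are
    forgotten.
    """
    if not 0.0 < memory_rate < 1.0:
        raise ValueError("memory_rate must be in (0, 1)")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    k = 0
    remaining = memory_rate  # memory of the unit one step back
    while remaining >= threshold:
        k += 1
        remaining *= memory_rate
    return k


if __name__ == "__main__":
    print(memory_capacity(0.8, 0.01))  # 20, since 0.8**20 ≈ 0.0115 and 0.8**21 ≈ 0.0092
    print(memory_capacity(0.5, 0.1))   # 3
```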

Two points about the formula are worth noting.

First, the experience value of a discrete map unit is non-renewable: no experience is calculated when the player-like agent passes the unit again. This conforms to the common-sense view of exploration as a one-time discovery. Moreover, the more often a player-like agent passes the same units, the more boring the game map design is [69, 72].

Second, the total experience value of a game map is positively correlated with the agent's total exploration time. Various greedy spatial traversal algorithms [24] can explore a map completely given enough time. The efficiency of time-limited exploration therefore reflects the player-like experience of a game map better.
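The two properties above (non-renewable units, time-limited exploration), together with the memory weighting described earlier, can be sketched as simple bookkeeping. The exact EBGMEF formula is not reproduced in this text, so the following is an assumed reading, not the authors' equation:
- each unit pays out an experience value of 1 only on its first visit;
- the agent remembers the rewards of its last k steps, weighting the reward j steps back by Y^j;
- play time is modelled as a fixed step budget.

The class and function names are hypothetical.

```python
from collections import deque
from typing import Deque, Hashable, List, Set


class ExplorationExperience:
    """Hypothetical EBGMEF-style bookkeeping for one player-like agent.

    Assumptions: each map unit pays out an experience value of 1 only on the
    agent's first visit (non-renewable); the agent remembers the rewards of its
    last k steps, and the reward j steps back (j = 0 is the current step) is
    weighted by memory_rate ** j.
    """

    def __init__(self, memory_rate: float, k: int) -> None:
        self.memory_rate = memory_rate
        self.visited: Set[Hashable] = set()
        # appendleft + maxlen: the oldest step falls off the right end.
        self.memory: Deque[float] = deque(maxlen=k)
        self.discovered = 0  # number of distinct units explored so far

    def step(self, unit: Hashable) -> float:
        """Move the agent into `unit` and return the immediate reward."""
        reward = 0.0 if unit in self.visited else 1.0
        if reward:
            self.visited.add(unit)
            self.discovered += 1
        self.memory.appendleft(reward)  # revisits push older memories out
        return reward

    def remembered_experience(self) -> float:
        """Memory-weighted experience of the last k steps."""
        return sum(r * self.memory_rate ** j for j, r in enumerate(self.memory))


def run_limited_time(path: List[Hashable], memory_rate: float, k: int,
                     max_steps: int) -> ExplorationExperience:
    """Replay a roaming path, truncated to a fixed step budget (play time)."""
    tracker = ExplorationExperience(memory_rate, k)
    for unit in path[:max_steps]:
        tracker.step(unit)
    return tracker


if __name__ == "__main__":
    path = [(0, 0), (0, 1), (0, 0), (0, 2), (0, 3)]  # (0, 3) lies past the budget
    t = run_limited_time(path, memory_rate=0.8, k=3, max_steps=4)
    print(t.discovered)                         # 3: the revisit earns nothing
    print(round(t.remembered_experience(), 2))  # 1.64 = 1 + 0*0.8 + 1*0.64
```

Because revisits still occupy a memory slot, an agent that keeps circling the same units ends up with a lower remembered experience. This mirrors the "more frequent passes, more boring map" remark.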


[4] MODEL-BASED REINFORCEMENT LEARNING FOR GAME MAP EVALUATION 


Model-free reinforcement learning (RL) can learn effective policies for complex tasks from basic interactions between agents and an environment with reward rules. One example is AI playing Atari games [73] from image observations of the running games. However, this typically requires very large amounts of interaction data and a long computing process for agents to learn. For example, OpenAI Five [74] used the equivalent of about 10,000 years of human game time to outperform the human world champions at the esports game Dota 2.

Model-based reinforcement learning (RL) can apply predefined behavior or environment models in several ways [75]:
- to set the agents' action policies;
- to conduct automatic learning with specific types of data augmentation;
- to shape the latent space in the time domain.

In each case, applying such models substantially improves efficiency. Using models of environments (informally, giving the agent the ability to predict its future) has a fundamental appeal for reinforcement learning [75, 76]. The spectrum of possible applications is vast, including learning policies from the model [77-82], capturing important details of the scene [83], encouraging exploration [84], creating intrinsic motivation [85], and counterfactual reasoning [86].
Therefore, we propose a model-based reinforcement learning method built on PSMCT and EBGMEF. It differs from previous game reinforcement learning models in four respects:

1) Player-like action strategy: agent i's action strategy function comes from the fixed action strategy model (PSMCT) of a specific player type i. The agent does not need training to improve its action strategy, which keeps the agent's behavior close to that of human players.

2) Player-like experience reward: agent i's roaming actions earn a reward R_i through EBGMEF. This reward reflects the human player's experience of the game map as exploration memory, rather than serving as a stimulus for training the agent's behavior.

3) Player-like spatial memory: the value Q_i of a map unit comes from agent i's direct action reward R_i combined with the spatial memory rate. The more proficient player i is at playing games, the higher the spatial memory rate [69, 72]. This configuration keeps the experience assessment close to that of human players.

4) Player-like total map evaluation: under this RL evaluation model, the total value of the same game map varies with the agent type. Likewise, different game maps receive different total values from the same agent type i. This is consistent with testing by human players.
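Points 1) to 4) can be illustrated with a toy evaluation loop. Each player type has a fixed, untrained stochastic policy, and every type roams every map for the same step budget from the map centre. The reward is the number of first-visited cells, and the totals form an agent-type by map matrix. Everything here is a simplified, hypothetical stand-in: the grid maps, the move probabilities, and the names `explore` and `cross_evaluate`. It omits the spatial-memory weighting and is not the Malmo-based implementation.

```python
import random
from typing import Dict, List, Tuple

# 4-neighbour roaming moves as (row, column) offsets.
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def explore(grid: List[str], move_probs: Dict[str, float], steps: int,
            seed: int) -> int:
    """Count distinct walkable cells discovered by a fixed stochastic policy.

    The policy is never updated (no training). Blocked moves ('#' or the map
    edge) leave the agent in place. The start cell counts as discovered.
    """
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    pos = (rows // 2, cols // 2)  # every agent spawns at the map centre
    visited = {pos}
    names = list(move_probs)
    weights = [move_probs[n] for n in names]
    for _ in range(steps):
        dr, dc = MOVES[rng.choices(names, weights=weights, k=1)[0]]
        r, c = pos[0] + dr, pos[1] + dc
        if 0 <= r < rows and 0 <= c < cols and grid[r][c] != "#":
            pos = (r, c)
            visited.add(pos)
    return len(visited)


def cross_evaluate(maps: Dict[str, List[str]],
                   agent_types: Dict[str, Dict[str, float]],
                   steps: int, seed: int = 0) -> Dict[Tuple[str, str], int]:
    """Score every (agent type, map) pair with the same budget and seed."""
    return {(agent, name): explore(grid, probs, steps, seed)
            for agent, probs in agent_types.items()
            for name, grid in maps.items()}


if __name__ == "__main__":
    maps = {
        "open": ["....."] * 5,
        "corridor": ["#####", "#####", ".....", "#####", "#####"],
    }
    agents = {  # hypothetical player types, not survey results
        "explorer": {"N": 0.25, "S": 0.25, "E": 0.25, "W": 0.25},
        "cautious": {"N": 0.1, "S": 0.1, "E": 0.7, "W": 0.1},
    }
    for key, score in sorted(cross_evaluate(maps, agents, steps=30).items()):
        print(key, score)
```

Reading the resulting matrix by row compares maps for one player type. Reading it by column compares player types on one map.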


[5] EXPERIMENTS 


This study uses Minecraft as the test environment, for three reasons.

First, Minecraft is popular in the game research community and has great potential for automated world map testing [87]. A large number of Minecraft maps are shared online, including Hogwarts School, King's Landing from A Song of Ice and Fire, the University of California, Berkeley, and the Beijing University of Posts and Telecommunications. These provide almost unlimited game map test resources.

Second, computation for automatic Minecraft map testing is simple. A Minecraft map is a standard octree discrete space with uniform unit size [88], players' roaming actions in Minecraft are clear, and the state-action calculation is simple.

Third, the total development workload of automatic iterative Minecraft map testing is low. Microsoft has published the Malmo reinforcement learning environment and open-sourced its code, which we modified into the player-like reinforcement learning model of this study.

This study uses Malmo version 0.37.0, and the modifications were written in Java. The modification consists of three steps:

A. Extend the map base class of Malmo. Each map unit has an initial exploration value (represented by a red rose on the map block) as a default attribute. During an agent test, each map unit records its experience value only once, when the agent first passes through it.

B. Build a PSMCT into the map test agent. First, the agent's PSMCT is consistent with the clustering results of the player survey in our experiment and represents the roaming behavior of a certain type of player. Second, the agent maintains the memory queue of map units. The queue length depends on the memory rate and the forgetting threshold. For example, if the memory rate is 0.8 and the forgetting threshold is 0.01, map units more than 20 cells back fall below the forgetting threshold (0.8^21 ≈ 0.0092 < 0.01), so the agent's memory queue length is set to 20.

C. Add a global test configuration, which includes the current test map files, the number of test agents, and the test time.

Besides modifying Malmo, the experiment focuses on establishing the player-like PSMCT. In this study, the transition probabilities of map states and agent actions in PSMCT modeling are obtained by questionnaire. So far, a player survey is more practical and more generally applicable to the map testing task than other approaches. Sharma et al. [89] proposed a higher-order classification of player modelling that distinguishes between (1) direct-measurement approaches (e.g., those using biometric data) and (2) indirect-measurement approaches (e.g., those inferring the player's skill level from in-game observations). An analysis of game log data [90] shows that experienced players often try more spatial choices in games. In [91], a potential field for a specific type of game scenario is established from multiple statistics of player behavior. The AI agent is then driven by the potential-field attraction of different regions. Multidimensional clustering and other methods [92] can effectively process game behavior log data. However, the contents of log data differ greatly across game types. For example, the location of a treasure box or a monster affects player behavior, yet such objects do not necessarily exist in a PCG game map. In summary, for game map testing, neither behavior log data nor in-game observation of any specific game can guarantee that the learned state-action behavior is representative and general.

Therefore, following the Delphi method, we invited human players to answer a questionnaire. The questionnaire includes two types of questions: classification questions on player experience [93] and map state-action questions. In 2015, Rafet Sifa et al. [94] analysed a large number of players' data on the Steam platform. They found that players' game time is the dominant feature determining player behavior. However, because of differences among game types, their research does not address clustering of roaming behavior in game maps. Fragoso et al. [44] proposed the StABLE player behavior model. In that study, advanced and non-advanced players, divided by game experience, differ in playing behavior (interaction frequency, moving distance, etc.), and this differentiation is highly stable across all scenarios. Following these studies, our player experience classification uses total game duration, number of games played, and play frequency as the classification criteria.

Following the PSMCT above, we assess the state-action probabilities of our Minecraft game maps in three steps. First, we define a variety of map states represented by representative landmarks in Minecraft. Then, we survey players' state-action selections and action ranges (which control agent speed). Finally, based on the answers, we cluster the Minecraft map behavior data by player type and set up the state-action functions of the PSMCT from the sampling probabilities.
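The last step, deriving state-action functions from sampling probabilities, amounts to taking relative answer frequencies within each player cluster. The sketch below assumes the answers have already been labeled with a cluster id. The tuple layout, the function name `estimate_policy` and the snake_case action labels (taken from the options of Question 25) are hypothetical, and the example answers are made up.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

Answer = Tuple[int, str, str]  # (player cluster, map state, chosen action)


def estimate_policy(answers: Iterable[Answer]) -> Dict[int, Dict[str, Dict[str, float]]]:
    """Turn survey answers into per-cluster state-action sampling probabilities."""
    counts: DefaultDict[int, DefaultDict[str, Counter]] = defaultdict(
        lambda: defaultdict(Counter))
    for cluster, state, action in answers:
        counts[cluster][state][action] += 1
    policy: Dict[int, Dict[str, Dict[str, float]]] = {}
    for cluster, states in counts.items():
        policy[cluster] = {}
        for state, ctr in states.items():
            total = sum(ctr.values())
            # Relative frequency of each answer = its sampling probability.
            policy[cluster][state] = {a: n / total for a, n in ctr.items()}
    return policy


if __name__ == "__main__":
    # Hypothetical answers to the three-way-road question.
    answers = [
        (0, "three_way_road", "turn_left"),
        (0, "three_way_road", "turn_left"),
        (0, "three_way_road", "take_middle"),
        (0, "three_way_road", "turn_right"),
        (1, "three_way_road", "random"),
    ]
    print(estimate_policy(answers)[0]["three_way_road"])
    # {'turn_left': 0.5, 'take_middle': 0.25, 'turn_right': 0.25}
```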
Table 1: Main survey elements

Player classification variables
  Total game time: How many hours have you played?
  Number of games: How many kinds of games have you played? (limited to games with interactive maps)
Map states
  Front Obstacles, One-Side L Obstacles, Two-Side U Obstacles, Front Step Up, Front Step Down, Wide Pavement, Three-Way Road


The answers use a 5-point Likert scale. The specific questions are as follows:


1. How long is your total game time?
   1. 0-10 hours  2. 10-100 hours  3. 100-500 hours  4. 500-1000 hours  5. More than 1000 hours
2. About how many games have you played? (Only games with interactive maps)
   2. 6-10  3. 11-20  4. 20-50  5. More than 50
3. How often do you play on average (time between play sessions)?
   1. More than half a year  2. 30 days to half a year  3. 8-30 days  4. 1-7 days  5. Less than 24 hours
4. How fast do you think your average behavioral response speed in games is?
5. When you decide to take an action, how large do you think your action range will be?
6. When you roam a game map, how large a map range can you remember?
7. In any game map, when you encounter an obstacle in front of you, which action do you choose by default?
8. How fast do you expect your reaction to be when you choose this action?
9. What action range do you expect when you choose this action?
10. In any game map, when you encounter a one-sided L-shaped obstacle in front of you, which action do you choose by default?
11. How fast do you expect your reaction to be when you choose this action?
12. What action range do you expect when you choose this action?
13. In any game map, when you encounter U-shaped obstacles on both sides ahead of you, which action do you choose by default?
14. How fast do you expect your reaction to be when you choose this action?
15. What action range do you expect when you choose this action?
16. In any game map, when you encounter an upward ladder or stairs in front of you, which action do you choose by default?
17. How fast do you expect your reaction to be when you choose this action?
18. What action range do you expect when you choose this action?
19. In any game map, when you encounter a downward ladder or stairs in front of you, which action do you choose by default?
20. How fast do you expect your reaction to be when you choose this action?
21. What action range do you expect when you choose this action?
22. In any game map, when you encounter open ground in front of you, which action do you choose by default?
23. How fast do you expect your reaction to be when you choose this action?
24. What action range do you expect when you choose this action?
25. In any game map, when you encounter a three-way road, which action do you choose by default?
   1. Random  2. Backward  3. Turn right  4. Turn left  5. Take the middle
Participants for the survey were recruited at Beijing University of Posts and
Telecommunications in November 2021; only one participant was younger than 20,
and player experience varied. About 3/4 of the participants were male and 1/4
female. 34 players participated in the session, and 29 valid answers remained
after anomaly checking. Processed in SPSS, the overall distribution of the
answers shows three significant clusters (Figure 7). The behaviors of all
testers in game maps are classified by hierarchical clustering according to
their game experience. Even in the same map state, the choice of behavior is
significantly related to player type (Figures 5 and 6). On the one hand, this
verifies the theory of player-behavior differentiation in the literature; on
the other hand, it directly supports the construction of three different
PSMCTs. The state-action probability of each node in a tree is derived from the
sampling probability of the corresponding player type, and the map memory rate
is taken directly as the arithmetic average over players of that type
(Question 7).
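The per-type probabilities described above can be illustrated with a short Python sketch. It assumes survey records of the form `(player_id, state, action)` and a player-to-type mapping produced by the hierarchical clustering step (done in SPSS in the study). The function names, state and action labels, and sample data are hypothetical, and the sketch only computes relative frequencies and averages; it does not build a full PSMCT.

```python
from collections import Counter, defaultdict


def state_action_probabilities(answers, player_type):
    """Estimate P(action | state) separately for each player type.

    answers: iterable of (player_id, state, action) survey records (hypothetical format).
    player_type: mapping player_id -> cluster label from the clustering step.
    Returns {(type, state): {action: probability}}.
    """
    counts = defaultdict(Counter)
    for pid, state, action in answers:
        counts[(player_type[pid], state)][action] += 1
    # Relative frequency of each action within one (type, state) group.
    return {
        key: {action: n / sum(c.values()) for action, n in c.items()}
        for key, c in counts.items()
    }


def memory_rate_by_type(memory_answers, player_type):
    """Arithmetic mean of the self-reported map memory rate for each player type."""
    groups = defaultdict(list)
    for pid, rate in memory_answers.items():
        groups[player_type[pid]].append(rate)
    return {t: sum(r) / len(r) for t, r in groups.items()}


# Hypothetical example data: two novice players and one experienced player.
player_type = {"p1": "novice", "p2": "novice", "p3": "experienced"}
answers = [
    ("p1", "three-way road", "turn left"),
    ("p2", "three-way road", "take the middle"),
    ("p3", "three-way road", "turn left"),
]
print(state_action_probabilities(answers, player_type)[("novice", "three-way road")])
# {'turn left': 0.5, 'take the middle': 0.5}
print(memory_rate_by_type({"p1": 0.5, "p2": 0.75, "p3": 0.9}, player_type))
# {'novice': 0.625, 'experienced': 0.9}
```

Keeping the player type in the dictionary key means one survey table yields one probability set per type, which mirrors the study's choice of building three separate trees rather than one pooled model.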

The test maps were taken from a Minecraft model of the Beijing University of
Posts and Telecommunications campus and divided into three game scenes with
obvious differences in appearance but the same total number of map units. To
avoid the spawn-point effect, all player-like agents spawn at the center of
each scene, and the RL traversal time is set to an identical 10 minutes.
BUPT campus map: Test Area 1, West School Gate Area (115 × 90 blocks); Test Area 2, Second Canteen Area (115 × 90 blocks); Test Area 3, Main Teaching Building Area (115 × 90 blocks).


Figure 10: The two images in the first row show the agent performing an action that leaves the z-coordinate unchanged on flat ground; the last two show the agent jumping when the way ahead is not walkable.


Player-like experience of map | Test Area 1: West School Gate Area | Test Area 2: Second Canteen Area | Test Area 3: Main Teaching Building Area

Agent 1 3250 | 56672 2 2649


1836 1652 2520 
1565 1166 


In the cross agent-map tests, the final player-like experience values of the game map generally differ across areas and agent types.

The player-like value table obtained in the experiment shows that, for Agent 1 (representing players with rich game
experience), the values of all map areas are relatively high, with the Main Teaching Building Area (the area whose
appearance has the highest spatial complexity) scoring highest. For Agent 2 (representing players with moderate game
experience), the highest-valued area is the Second Canteen Area, which contains both small buildings and flat ground.
For Agent 3 (representing novice players), the most valuable area is the West School Gate Area, which is entirely flat.


[6] DISCUSSION 


Aiming at testing the endless game maps generated by PCG and evaluating their
playable value, this study presents a modified reinforcement learning model
that uses player-like agents in place of human players in map testing, which
can greatly reduce the human-computer interaction workload as well as the time
and financial cost of testing. The contributions of this study include:

(1) This study proposes a feasible definition of agent behavior for map
testing from the perspective of player behavior modeling. Player behavior
modeling based on game data and questionnaire surveys has been studied in
various specific games, but differences between game types make the models
highly complicated. In fact, agent behavior used purely to test a map does not
need much complexity. Based on the principles of game space interaction
design, this study proposes a dedicated, pure behavior tree structure for game
map testing, which provides a unified player behavior model for testing
various game maps.

(2) From the perspective of player experience, this study proposes a map
value definition model.
Whether agents can obtain experience value in the game, and how conveniently
and how much of it they obtain, indicate the design quality of a game map.
Previous studies have been coupled to specific game types, making it difficult
to evaluate the design quality of a game map directly without the experience
value of other interaction elements. Starting from what the interactive
experience of game maps has in common, this study decomposes the overall
design quality of a spatial map into the cumulative quality of its grids, and
decomposes the quality of each grid into direct exploration and spatial memory
according to the theory of game psychology. This provides a unified player
experience evaluation model for testing various game maps.
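A minimal sketch of this decomposition, assuming the simplest additive form: each grid contributes an exploration term plus a spatial-memory term, and the map's quality is the sum over its grids. The `GridCell` type, its field names and the additive combination are illustrative assumptions; the paper's EBGMEF may weight or compute these terms differently.

```python
from dataclasses import dataclass


@dataclass
class GridCell:
    # Hypothetical per-grid components (assumption, not the paper's EBGMEF terms).
    exploration: float  # value gained by directly exploring the grid
    memory: float       # value retained through spatial memory of the grid


def grid_quality(cell: GridCell) -> float:
    """Quality of one grid: exploration plus memory (additive form assumed)."""
    return cell.exploration + cell.memory


def map_quality(cells: list[GridCell]) -> float:
    """Overall map design quality as the cumulative quality of its grids."""
    return sum(grid_quality(c) for c in cells)


cells = [GridCell(1.0, 0.5), GridCell(0.0, 0.25), GridCell(2.0, 0.0)]
print(map_quality(cells))  # 3.75
```

Because the total is a plain sum over grids, the same function applies to any map that can be divided into grids, which is the game-independence the paragraph above argues for.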

(3) Based on model-based reinforcement learning, this study establishes a set
of reinforcement learning models dedicated to map testing. The model in this
paper differs from previous studies in the following ways:

1) In this study, the agent's behavior itself does not change during
   learning, whereas the experience value of the game map is updated over
   the iterative learning process. The purpose of reinforcement learning is
   to automatically improve the accuracy and comprehensiveness of the map's
   experience value evaluation.

2) The player experience value of a map is inherently player-oriented: if
   the agent's behavior model represents a different type of player, the
   experience value of the same map differs.

3) Effective map evaluation and comparison require a limit on the number
   of agent actions. Maximizing experience value is not the goal of
   training the reinforcement learning model in this study. For agents with
   fixed player-like behavior patterns, an unlimited number of actions
   would certainly increase the experience value of a map, but this would
   defeat the purpose of map testing. Effective RL map testing must
   therefore take place within a limited time, in other words within a
   limited number of agent actions.
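Points 1) to 3) can be illustrated with a toy grid-world sketch in which the agent's stochastic policy stays fixed, each test episode is capped at a fixed action budget, and only the map's value estimate is refined across episodes. Everything here is an assumption made for illustration: the four-move action set, the rule that blocked moves leave the agent in place, the episode count, and the per-cell value (the fraction of episodes that reach the cell) stand in for the paper's PSMCT and EBGMEF, which are not reproduced here.

```python
import random

# Hypothetical four-move action set on a flat grid (assumption).
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def rollout(policy, width, height, budget, rng):
    """Follow a fixed stochastic policy for exactly `budget` actions.

    policy: {action_name: weight}; it is read but never modified.
    Returns the set of grid cells visited, spawn cell included.
    """
    x, y = width // 2, height // 2  # spawn at the map center, as in the experiment
    visited = {(x, y)}
    names, weights = zip(*policy.items())
    for _ in range(budget):  # the action budget bounds each test episode
        dx, dy = ACTIONS[rng.choices(names, weights=weights)[0]]
        nx, ny = x + dx, y + dy
        # Moves that would leave the map keep the agent in place (assumption).
        if 0 <= nx < width and 0 <= ny < height:
            x, y = nx, ny
        visited.add((x, y))
    return visited


def evaluate_map(policy, width, height, budget, episodes, seed=0):
    """Refine per-cell value estimates over episodes while the policy stays fixed.

    The per-cell value is the running fraction of episodes that reached the cell
    (a hypothetical stand-in for the experience value); the map value is their sum.
    """
    rng = random.Random(seed)
    value = {}
    for k in range(1, episodes + 1):
        visited = rollout(policy, width, height, budget, rng)
        for cell in set(value) | visited:
            hit = 1.0 if cell in visited else 0.0
            old = value.get(cell, 0.0)
            value[cell] = old + (hit - old) / k  # incremental mean over episodes
    return sum(value.values())


up_only = {"up": 1.0, "down": 0.0, "left": 0.0, "right": 0.0}
print(evaluate_map(up_only, 5, 5, budget=5, episodes=3))  # 3.0
```

With a fixed budget, two maps or two player types can be compared on equal terms; raising `budget` without limit would push the value of every cell the policy can reach toward 1, which is the bias point 3) warns against.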
Figure 4: Schematic diagram of the player-like agent reinforcement learning method


With the proposed reinforcement learning model, different types of players can be selected to test a map automatically
(three types, as shown above). In the experiment of this study, evaluations of the same game map differ across player
types, and the total experience value obtained by the same player-like agent differs across maps, which makes it
possible to evaluate and compare the interactive value of map designs from the perspective of specific types of target
players. Moreover, the differences between player types can help PCG designers improve existing game maps or
generate new maps tailored to their target players.

This study still has two limitations:

First, the proposed reinforcement learning model is not coupled with PCG in an iterative loop
for automatic game map design. In this study, the map is evaluated through the player-like
agents' testing behavior without any alteration of the map's own spatial structure. In the
future, PCG game map designs could be updated automatically and iteratively according to the
RL evaluation to maximize the player-like experience value, which would move this study
further toward the goal of AI-driven game design.

Second, the experiment in this study is carried out only in Minecraft, where computing the PSMCT and EBGMEF is
quite simple compared with AAA games. A Minecraft map is a three-dimensional volume based on the classical octree,
the agents' modeled behaviors are also simple, and the overall computational workload is much smaller than for
complex 3D maps. Current mainstream games, especially AAA titles, use high-precision 3D maps in which the player's
state-action patterns are more complex than in Minecraft. To migrate this approach to other types of games, more
research is needed on the definition of state-action strategies, the calculation of map experience, and the
optimization of test computation. In particular, fast methods for extracting players' state-action models from game
log data [28] [96] need to be studied to replace the current standalone and inefficient player questionnaire.


[7] CONCLUSIONS 


As the review of the current literature shows, game map testing struggles to keep pace with PCG development in an
automatic manner. Objective and rapid automatic testing can reflect only some superficial indicators of game maps,
and cannot evaluate the strengths and weaknesses of game maps from players' perspectives. Game map testing still
requires substantial manual participation, raising costs for the game industry. While much of the previous
literature has focused on applying reinforcement learning in games, especially on AI agents that play various
games, few studies have addressed game testing.

In general, this study provides a new idea and computational framework for automated testing of game maps. Its
contribution is a modified reinforcement learning model that combines objective and subjective testing to support
the effectiveness of game map test results, including the proposed agent behavior tree model (PSMCT) and the player
experience evaluation function for map testing (EBGMEF). In the Minecraft experiment, three types of agents derived
from player surveys acted in three test scenes and automatically evaluated and scored the game maps from distinct
player-like perspectives. The experimental results capture subjective player experience better than earlier
automatic script-based map test methods, and the approach offers broader map testing capability than some
game-specific AI models. Moreover, there is scope for further research that combines the player-like AI test method
with PCG to realize iterative automatic game design, enabling both to co-evolve.